library(gapminder)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0.9000 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.2
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.2
## Warning: package 'dplyr' was built under R version 3.5.2
## Warning: package 'stringr' was built under R version 3.5.2
## Warning: package 'forcats' was built under R version 3.5.2
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(forcats)
library(ggplot2)
library(here)
## here() starts at /Users/SarahDada/Desktop/Gitgood/stat545-hw-sarahdada
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Assignment 1:
The here package does two things especially well: it builds file paths that work across operating systems, and it detects the project's root directory automatically.
This lets other people run your code. Because here() joins the path components itself, the same call works on both Mac and Windows, so it is platform independent. Another user doesn’t have to edit the code to point at an absolute path on their own machine; here() finds the file relative to the project root, which it identifies from the R project file. Additionally, if you run a file without opening the RStudio project, it still works as long as the working directory is somewhere inside the project folder, because here() searches upward for the R project file.
Unlike setwd(), here() resolves paths from a root directory, so it knows where a file is stored without depending on the current working directory. It also lets you write a file to the intended folder, no matter which subfolder the script is run from. If the project folder is moved or renamed, the paths still resolve, because none of them is absolute. Likewise, others can run the code on their own computers instead of depending on your local path. Managing subdirectories is also easier: each folder is passed as a separate argument, and the root is found from the R project file.
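As a minimal sketch of the idea above, here() builds a path from the project root with one folder per argument. The file name example.csv is hypothetical, and nothing is read or written:

```r
library(here)

# Hypothetical file name, used only to show how the path is assembled.
csv_path <- here("HW05", "example.csv")

# The result begins at the detected project root, not at an absolute
# path typed in by the author.
print(here())     # the project root on whichever machine runs this
print(csv_path)   # <project root>/HW05/example.csv

# Writing the same location by hand means spelling out the root yourself.
manual_path <- file.path(here(), "HW05", "example.csv")
```

The printed root will differ from one computer to the next, which is the point: the code stays the same while the root is worked out at run time.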
Assignment 2:
I used class() to determine that continent is a factor, and str() to determine that it is a factor with five levels.
I filtered out Oceania using !=, rather than naming all of my continents of interest. Using nlevels(), I can see that there are still five levels in the data set despite filtering out Oceania, because filtering removes rows but not the factor's levels. I wanted to remove the unused level, so I used droplevels(). Upon checking the levels again with nlevels(), I can see that there are only four, one for each remaining continent.
class(gapminder$continent)
## [1] "factor"
str(gapminder$continent)
## Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
nlevels(gapminder$continent)
## [1] 5
no_oceania_gap <- gapminder %>%
  filter(continent != "Oceania")
nlevels(no_oceania_gap$continent)
## [1] 5
dropped_oceania <- no_oceania_gap %>%
  droplevels()
nlevels(dropped_oceania$continent)
## [1] 4
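The same behaviour can be seen on a tiny factor in base R. This is only a sketch with made-up values standing in for gapminder$continent, but it shows why the level count stays the same after filtering and what droplevels() changes:

```r
# Toy factor standing in for gapminder$continent (values are made up).
continent <- factor(c("Africa", "Asia", "Oceania", "Asia"))

# Subsetting removes the Oceania entry but keeps the Oceania level.
kept <- continent[continent != "Oceania"]
nlevels(kept)   # 3
table(kept)     # Oceania is still listed, with a count of 0

# droplevels() discards levels that no longer occur in the data.
trimmed <- droplevels(kept)
nlevels(trimmed)  # 2
levels(trimmed)   # "Africa" "Asia"
```

The zero count in table(kept) is what an unused level looks like in practice; it would also show up as an empty category in a plot legend or axis.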
# The median life expectancy of all continents except Oceania, before releveling.
no_oceania_gap %>%
  group_by(continent) %>%
  summarize(medianLifeExp = median(lifeExp)) %>%
  ggplot() +
  geom_col(aes(continent, medianLifeExp)) +
  coord_flip() +
  theme_bw() +
  ylab("Median Life Expectancy") + xlab("Continent")
# Oceania-free gapminder with continents reordered by median life expectancy.
# After coord_flip(), the bars read from highest (top) to lowest (bottom).
no_oceania_gap %>%
  group_by(continent) %>%
  summarize(medianLifeExp = median(lifeExp)) %>%
  ggplot() +
  # Each continent has a single summarized value, so no summary function is needed.
  geom_col(aes(x = fct_reorder(continent, medianLifeExp), y = medianLifeExp)) +
  coord_flip() +
  theme_bw() +
  ylab("Median Life Expectancy") + xlab("Continent")
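To see what fct_reorder() does to the levels themselves, without any plotting, here is a small sketch. The numbers are made up for illustration and are not the gapminder medians:

```r
library(forcats)

# Made-up values for illustration only; these are not gapminder results.
continent <- factor(c("Africa", "Americas", "Asia", "Europe"))
median_life <- c(50, 70, 60, 75)

# Default factor levels are alphabetical.
original_levels <- levels(continent)

# fct_reorder() sorts the levels by the second argument, lowest first.
ascending_levels <- levels(fct_reorder(continent, median_life))

# .desc = TRUE reverses the order, so the highest value comes first.
descending_levels <- levels(fct_reorder(continent, median_life, .desc = TRUE))

print(original_levels)    # "Africa" "Americas" "Asia" "Europe"
print(ascending_levels)   # "Africa" "Asia" "Americas" "Europe"
print(descending_levels)  # "Europe" "Americas" "Asia" "Africa"
```

Only the level order changes; the data values stay attached to the same continents.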
Assignment 3: I filtered the gapminder data set to the year 2007, and then grouped by country. I summarized the median life expectancy of each country; since each country has only one row for 2007, the median is simply that year's life expectancy. I wrote the data to a csv file in my HW05 folder, and named it “CountryLifeExp.csv”. When I read the data back in, I stored it as “goodtime”, because I thought it was a fun name.
CountryLife <- gapminder %>%
  filter(year == 2007) %>%
  group_by(country) %>%
  summarize(medianLifeExp = median(lifeExp))
write_csv(CountryLife, here::here("HW05", "CountryLifeExp.csv"))
goodtime <- read_csv(here("HW05", "CountryLifeExp.csv"))
## Parsed with column specification:
## cols(
## country = col_character(),
## medianLifeExp = col_double()
## )
goodtime
## # A tibble: 142 x 2
## country medianLifeExp
## <chr> <dbl>
## 1 Afghanistan 43.8
## 2 Albania 76.4
## 3 Algeria 72.3
## 4 Angola 42.7
## 5 Argentina 75.3
## 6 Australia 81.2
## 7 Austria 79.8
## 8 Bahrain 75.6
## 9 Bangladesh 64.1
## 10 Belgium 79.4
## # … with 132 more rows
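The column specification above shows that country came back as a character column rather than a factor. A minimal sketch of one way to keep it a factor when reading is below, assuming readr's col_factor() with its default of inferring levels from the data. The toy data frame and the temporary file are made up, so nothing in the HW05 folder is touched:

```r
library(readr)

# Toy data standing in for CountryLife (values are made up).
toy <- data.frame(
  country = factor(c("B", "A")),
  medianLifeExp = c(1.5, 2.5)
)

# Write to a temporary file rather than the HW05 folder.
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp)

# Default read: the factor comes back as plain character.
plain <- read_csv(tmp)
class(plain$country)   # "character"

# Supplying col_types asks readr to parse the column as a factor.
typed <- read_csv(
  tmp,
  col_types = cols(country = col_factor(), medianLifeExp = col_double())
)
class(typed$country)   # "factor"
```

A csv file stores only text, so any factor ordering set before writing has to be reapplied after reading.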
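Assignment 4: My previous work showed the number of countries per continent in the year 2007, in black and white. My title and subtitle were not showing, because they were written as separate lines after the closing parenthesis of labs() instead of being passed to it as arguments.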
Continentcount<-(gapminder %>%
filter(year==2007) %>%
count(continent))
Continentcount %>%
ggplot(aes(x = continent, y = n))+
geom_bar(stat = "identity") +
labs(x = "Continent",y = "Number of Countries")
title = "Number of countries in each Continent"
subtitle = "(based on 2007 gapminder data)"
New graph: I changed the bar graph so that each bar has a different colour, which makes the continents distinct. I removed the distracting grey background but kept the grid lines, so that you can still read off the number of countries while the plot looks cleaner. It was suggested in class to remove the lines entirely, but then you could not determine the number of countries, so this was an in-between. I removed the x axis title because it was redundant (those are evidently continents), but kept the y axis title, and added a better title that properly explains what the x and y axes show. The subtitle tells us where the data came from; however, the subtitle disappears when I add the “plotly” interactive component. I used “plotly” to make the graph interactive, so you can see how many countries are in a continent (denoted “n” when hovering) by hovering over its bar. I made this a separate graph, so that one graph has the subtitle and is not interactive, and the other is an html widget.
Continentcount <- gapminder %>%
  filter(year == 2007) %>%
  count(continent)
PrePlotlyContinent <- Continentcount %>%
  ggplot(aes(x = continent, y = n, fill = continent)) +
  geom_bar(stat = "identity") +
  labs(x = "",
       y = "Number of Countries",
       title = "Number of Countries per Continent",
       subtitle = "Data taken from year 2007 of the Gapminder data set") +
  theme_bw()
PrePlotlyContinent
PrePlotlyContinent %>% ggplotly()
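One commonly used workaround for the missing subtitle is to fold it into the title text with HTML tags after converting the plot. The sketch below uses a made-up toy data frame and placeholder title strings, and I have not checked it against the exact plot above, so treat it as a starting point rather than a confirmed fix:

```r
library(ggplot2)
library(plotly)

# Toy counts for illustration; these are not the gapminder counts.
toy <- data.frame(group = c("a", "b", "c"), n = c(3, 5, 2))

static_plot <- ggplot(toy, aes(x = group, y = n, fill = group)) +
  geom_col() +
  labs(title = "Main title", subtitle = "Subtitle text") +
  theme_bw()

# ggplotly() carries over the title but not the subtitle, so the subtitle
# is appended to the title as a second, smaller line using HTML tags.
interactive_plot <- plotly::layout(
  ggplotly(static_plot),
  title = list(
    text = paste0("Main title", "<br><sup>", "Subtitle text", "</sup>")
  )
)

interactive_plot
```

The call is written as plotly::layout() because plotly's layout() masks graphics::layout(), as the startup messages at the top show.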
Assignment 5:
I stored the path to my Homework 5 folder as “p” and saved my .png files in the HW05 folder. Passing path = p lets me save later versions to that folder as well. I changed the height, width, and units in “PrePlotlyContinent2”.
p <- here::here("HW05")
ggsave(here("HW05", "PrePlotlyContinent.png"))
## Saving 7 x 5 in image
ggsave("PrePlotlyContinent4.png", path = p, width = 9, height = 9, units = "in")
# I made the plot larger.
ggsave("PrePlotlyContinentSmaller.png", path = p, width = 4, height = 4, units = "in")
# I made the plot smaller.
ggsave("PrePlotlyContinentDPIsmall.png", path = p, width = 4, height = 4, units = "in", dpi = 100)
# I lowered the resolution of the raster image using dpi. I needed path = p; otherwise ggsave() saves to the working directory rather than the HW05 folder.
ggsave("PrePlotlyContinentDPIhigh.png", path = p, width = 4, height = 4, units = "in", dpi = 400)
# The higher dpi gave better resolution.
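The calls above rely on ggsave() picking up the last plot that was displayed. A slightly safer pattern, sketched here with a made-up toy plot and a temporary folder instead of HW05, is to pass the plot explicitly through the plot argument:

```r
library(ggplot2)

# Toy plot for illustration; not the continent plot from above.
toy_plot <- ggplot(data.frame(x = c("a", "b"), n = c(1, 2)), aes(x, n)) +
  geom_col()

# Temporary folder, so nothing is written into the project.
out_dir <- tempdir()

# Naming the plot means the result does not depend on what was drawn last.
# At 4 x 4 inches, dpi = 100 gives a 400 x 400 pixel image.
ggsave("toy_low.png", plot = toy_plot, path = out_dir,
       width = 4, height = 4, units = "in", dpi = 100)

# The same size at dpi = 400 gives a 1600 x 1600 pixel image.
ggsave("toy_high.png", plot = toy_plot, path = out_dir,
       width = 4, height = 4, units = "in", dpi = 400)

low_file <- file.path(out_dir, "toy_low.png")
high_file <- file.path(out_dir, "toy_high.png")
```

The pixel dimensions are just width or height in inches multiplied by dpi, which is why the higher dpi file looks sharper at the same printed size.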